---
title: "What Does It Take to Be Rich? Asking Reasonable Survey Questions about Income Inequality"
author: "Kris-Stella Trump"
date: June 12, 2023
header-includes:
   - \usepackage{setspace}\doublespacing
   - \setlength\parindent{24pt}
   - \usepackage{pdflscape}
   - \newcommand{\blandscape}{\begin{landscape}}
   - \newcommand{\elandscape}{\end{landscape}}
output: 
  pdf_document:
    number_sections: false
    fig_caption: yes
    keep_tex: yes
bibliography: whoisrich.bib 
abstract: "Measuring public perceptions of economic inequality is challenging. Even though the concept of unequal resources is intuitive, most mathematical summaries of inequality are not. Additionally, humans are better at thinking in terms of prototypical representations of different groups than in terms of statistical properties of distributions. As a result, asking respondents to estimate numeric indicators of unequal distributions results in high rates of missing and ad-hoc responses. To circumvent this problem, this article proposes and characterizes two survey items about income differences that refer primarily to mental representations of social groups, but that can still be used to explore respondents' perceptions of economic inequality. The survey items ask respondents to estimate the income at which a household becomes rich, and the income at which a household becomes poor. Three survey samples from two countries show that response patterns are plausible and exhibit expected correlates. These items sacrifice the existence of objectively correct numeric answers, but gain insight into respondents' subjective perceptions of the differences between the rich and the poor. Consequently, these items may improve our ability to study the correlates and determinants of lay perceptions of economic inequality."
---

```{r setup, include=FALSE, warning=F}
knitr::opts_chunk$set(echo = F)
# r version 4.2.3 

library(tidyverse)
library(ggplot2)
library(haven)
library(foreign)
library(purrr)
library(kableExtra)
library(gridExtra)

#set number of digits
options(digits=2)

#load survey data 
pilot <- read.csv("pilot.csv", na.strings="NA")
respondi <- read.csv("respondi.csv", na.strings="NA")
yougov <- read_sav("main.sav")

#load actual income distribution data
de18 <- read.csv("de18.csv")
us19 <- read.csv("us19.csv")

#winsorizing for analyses
#method 1 - reference real income distribution
#if rich threshold is higher than the 99th percentile, winsorize to 99th percentile

#method 2 (topcoded_2) - reference only respondent guesses
#recode responses above the 95th percentile of responses to that 95th percentile value (top-coding only; low responses are left unchanged)
pilot <- pilot %>%
  mutate(rich_topcoded = ifelse(inc_rich_clean>us19[us19$percentiles==99,"us19_quantiles"], us19[us19$percentiles==99,"us19_quantiles"], inc_rich_clean),
         rich_topcoded_2 = ifelse(inc_rich_clean>quantile(pilot$inc_rich_clean, na.rm=T, probs=0.95), quantile(pilot$inc_rich_clean, na.rm=T, probs=0.95), inc_rich_clean)
  )

respondi <- respondi %>%
  mutate(rich_topcoded = ifelse(rich_inc>de18[de18$percentile==99,"de18_quantile"], de18[de18$percentile==99,"de18_quantile"], rich_inc),
         rich_topcoded_2 = ifelse(rich_inc>quantile(respondi$rich_inc, na.rm=T, probs=0.95), quantile(respondi$rich_inc, na.rm=T, probs=0.95), rich_inc),
         poor_topcoded = ifelse(poor_inc>de18[de18$percentile==99,"de18_quantile"], de18[de18$percentile==99,"de18_quantile"], poor_inc),
         poor_topcoded_2 = ifelse(poor_inc>quantile(respondi$poor_inc, na.rm=T, probs=0.95), quantile(respondi$poor_inc, na.rm=T, probs=0.95), poor_inc),
         rich_poor_ratio = rich_topcoded/poor_topcoded
  )

yougov <- yougov %>%
  mutate(faminc_new = na_if(faminc_new,97),
         rich_topcoded = ifelse(Q1>us19[us19$percentiles==99,"us19_quantiles"], us19[us19$percentiles==99,"us19_quantiles"], Q1),
         poor_topcoded = ifelse(Q2>us19[us19$percentiles==99,"us19_quantiles"], us19[us19$percentiles==99,"us19_quantiles"], Q2),
         rich_topcoded_2 = ifelse(Q1>quantile(yougov$Q1, na.rm=T, probs=0.95), quantile(yougov$Q1, na.rm=T, probs=0.95), Q1),
         poor_topcoded_2 = ifelse(Q2>quantile(yougov$Q2, na.rm=T, probs=0.95), quantile(yougov$Q2, na.rm=T, probs=0.95), Q2),
         rich_poor_ratio = rich_topcoded/poor_topcoded
  )



#give each estimate a percentile
yougov <- yougov %>% add_column(rich_percentile = NA)
yougov <- yougov %>% add_column(poor_percentile = NA)
respondi <- respondi %>% add_column(rich_percentile = NA)
respondi <- respondi %>% add_column(poor_percentile = NA)

for (i in 1:nrow(yougov)) {
  yougov$rich_percentile[i] <- sum(yougov$Q1[i]>us19$us19_quantiles)
}

for (i in 1:nrow(yougov)) {
  yougov$poor_percentile[i] <- sum(yougov$Q2[i]>us19$us19_quantiles)
}

for (i in 1:nrow(respondi)) {
  respondi$rich_percentile[i] <- sum(respondi$rich_inc[i]>de18$de18_quantile)
}

for (i in 1:nrow(respondi)) {
  respondi$poor_percentile[i] <- sum(respondi$poor_inc[i]>de18$de18_quantile)
}


#give each estimate a decile
#first find deciles
us19_deciles <- us19[c(10,20,30,40,50,60,70,80,90),"us19_quantiles"]
de18_deciles <- de18[c(10,20,30,40,50,60,70,80,90),"de18_quantile"]

#then match as for percentiles
yougov <- yougov %>% add_column(rich_decile = NA)
yougov <- yougov %>% add_column(poor_decile = NA)
respondi <- respondi %>% add_column(rich_decile = NA)
respondi <- respondi %>% add_column(poor_decile = NA)

for (i in 1:length(yougov$rich_topcoded)) {
  yougov$rich_decile[i] <- (sum(yougov$rich_topcoded[i]>=us19_deciles))*10
}

for (i in 1:length(yougov$poor_topcoded)) {
  yougov$poor_decile[i] <- (sum(yougov$poor_topcoded[i]>=us19_deciles))*10
}

for (i in 1:length(respondi$rich_topcoded)) {
  respondi$rich_decile[i] <- (sum(respondi$rich_topcoded[i]>=de18_deciles))*10
}

for (i in 1:length(respondi$poor_topcoded)) {
  respondi$poor_decile[i] <- (sum(respondi$poor_topcoded[i]>=de18_deciles))*10
}


#and reverse question: for each percentile, what share of respondents think a person above that cutoff is rich? 
de18 <- de18 %>% add_column(cumul_rich_respondi = NA) #assign the result so the new column is kept
for (i in 1:100){
  de18$cumul_rich_respondi[i] <- sum(respondi$rich_inc<=de18$de18_quantile[i], na.rm=T)/sum(is.na(respondi$rich_inc)!=1)
}
#include those who estimated over the 100% cut-off in the last percentile count
de18$cumul_rich_respondi[100] <- 1

us19 <- us19 %>% add_column(cumul_rich_yougov = NA) #assign the result so the new column is kept
for (i in 1:100){
  us19$cumul_rich_yougov[i] <- sum(yougov$Q1<=us19$us19_quantiles[i], na.rm=T)/sum(is.na(yougov$Q1)!=1)
}
#include those who estimated over the 100% cut-off in the last percentile count
us19$cumul_rich_yougov[100] <- 1

#for each percentile, what share of respondents think a person below that cutoff is poor? 
de18 <- de18 %>% add_column(cumul_poor_respondi = NA) #assign the result so the new column is kept
for (i in 1:100){
  de18$cumul_poor_respondi[i] <- sum(respondi$poor_inc>=de18$de18_quantile[i], na.rm=T)/sum(is.na(respondi$poor_inc)!=1)
}
#include those who estimated below the 1% cut-off in the bottom percentile count
de18$cumul_poor_respondi[1] <- 1

us19 <- us19 %>% add_column(cumul_poor_yougov = NA) #assign the result so the new column is kept
for (i in 1:100){
  us19$cumul_poor_yougov[i] <- sum(yougov$Q2>=us19$us19_quantiles[i], na.rm=T)/sum(is.na(yougov$Q2)!=1)
}
#include those who estimated below the 1% cut-off in the bottom percentile count
us19$cumul_poor_yougov[1] <- 1

# round the guesses for rich and poor to deciles
roundUp <- function(x,to=10)
{
  to*(x%/%to + as.logical(x%%to))
}
respondi <- respondi %>% 
  mutate(rich_share_decile = roundUp(rich_share),
         poor_share_decile = roundUp(poor_share),
         richpluspoor_decile = roundUp(richpluspoor),
         rich_share_decile = ifelse(rich_share_decile==0, 10, rich_share_decile), #to collapse "0" into implied "0 to 10" decile
         poor_share_decile = ifelse(poor_share_decile==0, 10, poor_share_decile),
         richpluspoor_decile = ifelse(richpluspoor_decile==0, 10, richpluspoor_decile)
         )

# additional wrangling
respondi <- respondi %>% 
  mutate(above_median = hh_inc > median(hh_inc, na.rm=T),
         high_educ = educ==9 | educ==10
         )

yougov <- yougov %>% 
  mutate(above_median = faminc_new > median(faminc_new, na.rm=T))

#function for finding mode
getmode <- function(v) {
   v <- v[!is.na(v)] #drop missing values so that NA cannot be returned as the mode
   uniqv <- unique(v)
   uniqv[which.max(tabulate(match(v, uniqv)))]
}

#extract the pretty blue from colorbrewer
myblue <- c("#1F78B4")
```

\clearpage

Measuring perceptions of economic inequality is challenging. On the one hand, the idea that some have more than others is intuitive, and signs of economic differences are ubiquitous in everyday life. On the other hand, economic inequality is an abstract concept with numerous equally plausible but mathematically complex numerical representations (e.g. the Gini coefficient, the 90:10 income ratio, or the top 1% wealth share). Researchers studying how the public perceives inequality thus need to make consequential decisions regarding how to ask respondents about this concept.

One common approach is to select numeric representations of inequality and ask survey respondents what they think those numbers are. This approach, in effect, asks respondents to give their estimates of researchers' preferred indicators of inequality. The results are used to evaluate how accurately people perceive economic inequality and how those perceptions are related to other variables of interest. However, this approach presumes more numeric ability and awareness than most respondents are able to offer [@pedersen2019; @phillips2020; @heiserman2021]. As a consequence, analyses that rely on these items may be inaccurate [@eriksson2012; @chambers2014].

This article proposes an alternative set of survey items on income inequality: asking people what incomes qualify a household as rich or poor. These items capture an intuitive, yet numerically expressed, sense of the differences between the rich and the poor. Specifically, the ratio of the two responses measures the perceived gap between the rich and the poor. This approach reverses the usual order of operations (first choose the benchmark, then measure perceptions). Instead, these items prioritize capturing an intuitive sense of inequality, making it the job of researchers to figure out which aspects of objective inequality affect perceived inequality.

This article describes the benefits and costs of these survey items, first theoretically and then empirically. Theoretically, these questions have the benefit of being intuitive to answer. A key cost is that they lack objectively correct responses: the items refer to "the rich" and "the poor", but the social and cultural referents of these terms shift over time and space. Below, I argue that this ambiguity can be a feature and not just a bug. Turning to empirical validation, I use three survey samples from two countries to illustrate that responses are generally reasonable and have expected correlates. The concluding sections give suggestions for how these items could be integrated into research on perceptions of inequality.

# Measuring perceptions of inequality

Selecting survey items requires satisfying two motivations. First, we want respondents to be able to answer our questions. Second, we want the questions to be informative for our research agendas. The complexity of economic inequality unfortunately makes these motivations difficult to reconcile. I propose that asking "At what level of income would you say that a household becomes [rich/poor]?" is a reasonable balancing act between the two motivations. Below, I address each motivation in turn.

## Can people answer these survey questions?

The questions are designed with lessons from the literature in mind. In general, survey questions that are too difficult lead to high rates of don't knows [@shoemaker2002], ad-hoc responses [@sturgis2010a], or wrong answers that exhibit significant test-retest variation even among respondents who report being sure of their estimates [@graham2023]. In the case of economic inequality, several common questions exhibit related issues, such as high rates of logical fallacies [@heiserman2021], anchoring effects [@pedersen2019], and partition dependence [@bogard2022]. These problems affect inequality estimates derived from questions about the incomes of different occupations [@jasso2000; @osberg2006; @trump2018a; @trump2023], choices among visual representations of inequality [@niehues2014; @chambers2014; @bobzien2020], and estimated shares of wealth [@norton2011a; @eriksson2012] and incomes [@boudreau2018]. Even though the questions enumerated here have all been designed for simplicity, it nonetheless seems that we are not yet asking questions about this complex topic that respondents can intuitively answer.

One way to simplify inequality questions is to side-step numeric estimates, for example by asking whether inequality has increased or decreased [@bartels2005a]. However, depending on the research question at hand, we may need items with numeric answers, as these are easier to compare across respondents and time.

Another way to simplify the questions is to focus on features of the social world that are mentally accessible (i.e. that come to mind easily). Many inequality measures refer to extensional attributes (such as averages over a distribution), which are difficult for human beings to represent mentally [@kahneman2003]. Accessible features, instead, tend to be "prototypes", such as mental representations of a typical member of a group. This insight has been applied to questions about inequality with promising results. @chambers2014 and @eriksson2012 argue that asking about average incomes gives better results than asking about the total income of specific deciles. This approach was applied at scale by @pontusson2020, who ask respondents to estimate the incomes of average households at the 10th and 90th percentiles. Here, I go one step further and drop the reference to income deciles. Instead, I rely on the respondents' prototypes of rich and poor households.

In sum, I expect these questions to be relatively easy to answer because they:

-   Have a concrete and mentally accessible target: prototypical perception of rich/poor households.
-   Have a concrete and mentally accessible benchmark: the respondent's own household income.
-   Do not require respondents to reason about distributions, including the concept of deciles.
-   Reference income in culturally relevant units.
-   Do not specify household size. \footnote{While it is likely that respondents intuitively think of households with different sizes, it is unlikely that they could accurately adjust for household size if prompted. Therefore, including such a prompt runs the risk of introducing more error than it removes.}

## Are these survey questions useful for research?

The proposed items measure perceived inequality: the ratio between the two responses indicates how large the gap between the rich and the poor appears to the respondent. This ratio can be used to answer questions like: Do people in unequal places perceive more inequality? Does press coverage of billionaires increase the perceived gap between the rich and the poor? Does pre- or post-tax income inequality have a larger correlation with perceived inequality?

We can think about this way of setting up our research questions by observing, as @jachimowicz2022 do, that economic inequality has many features (including geography, social group, and resource type), and that this results in a manifold of specific inequalities that researchers may want to study. While researchers can and should specify which facet of inequality they study, it will not be feasible to examine public perceptions of each such facet. Partly, this is because of researchers' resource constraints. Mainly, however, the limitation arises because these distinctions will be too obscure for most survey respondents. For example, researchers may want to ask questions such as: Are lay perceptions of inequality affected by the 90:10 income ratio at the county level? The top 1% income share nationally? The poverty rate among the respondent's racial group? The respondent may not be able to accurately cite any of these statistics. In fact, they most likely will not be able to do so [@heiserman2021]. However, they almost certainly have qualitative impressions of the gap between the haves and the have-nots, and these impressions may be influenced by any of these measures of inequality. The items proposed here seek to capture those qualitative impressions, leaving it to researchers to figure out the objective measures of inequality that most influence these intuitive perceptions.

When interpreting answers to these questions, we need to remember that because the terms "rich" and "poor" are subjective, the items do not have objectively correct answers. Additionally, the connotations that these terms carry will shift across social and political contexts. For example, it is likely that the Occupy Movement (whose slogan "we are the 99%" referred to the runaway incomes of the top 1% of earners) increased the share of the population who associate the term "the rich" with the top 1%. It is an unavoidable feature of these items that answers to them reflect politically salient class categories.

As a consequence, we cannot assume the answers refer to specific percentiles, and research questions that call for such specificity require different operationalization. However, capturing salient class divisions can be useful for projects that seek to understand which aspects of *de facto* economic differences are most salient to respondents. For example, these items allow us to evaluate whether most respondents think of the rich as the top 10%, the top 1%, or billionaires. The answers teach us about the contours of class politics, which is what many research questions about perceived inequality are ultimately about.

# Data and Results

```{r setup rounding for in-text numbers 1, include=F}
options(digits=1)
```

The full wording of the proposed items is:

-   "At what level of income would you say that a household becomes rich? In other words, how much money does a family need to make [per year] for you to consider them rich?"

-   "At what level of income would you say that a household becomes poor? In other words, how little money does a family need to make [per year] for you to consider them poor?"

The items were fielded on three samples: one convenience sample from the United States (n = `r nrow(pilot)`, Prolific opt-in panel) and two quota-based samples from the United States (n = `r nrow(yougov)`, YouGov opt-in panel with online and random digit dial recruitment, followed by quota sampling) and Germany (n = `r nrow(respondi)`, Respondi opt-in panel with online recruitment, followed by quota sampling). In the rest of the paper I will refer to the quota-based samples as representative; demographic details are available in Online Appendix A.

The element in square brackets reflects cultural norms around discussing incomes; the wording above applies to the United States, while in Germany the question referenced net monthly incomes. See @fernandez-albertos2018 and @pontusson2020 for similar adjustments.

As discussed above, because "rich" and "poor" are subjective categories, responses cannot be evaluated against objectively correct answers. However, we can probe the plausibility of responses by comparing them to the actual income distribution. Figures \ref{fig:rich_decile_graphs} and \ref{fig:poor_decile_graphs} visualize the income that representative samples of Americans and Germans think makes a household rich or poor respectively, plotted against deciles of the actual income distribution [@lis2018]. Visualizations based on percentiles are available in Online Appendix B.
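
As a minimal sketch of this mapping, the snippet below assigns threshold responses to deciles of an income distribution. The object names (`decile_cutoffs`, `rich_guess`) and all values are hypothetical and chosen for illustration; they are not the data analyzed in this article. The sketch uses base R's `findInterval()` as a vectorized shortcut rather than reproducing the exact code behind the figures.

```r
# Hypothetical decile cut-offs (10th to 90th percentile) of an income
# distribution; illustrative values only, not the data used in this article.
decile_cutoffs <- c(15000, 25000, 35000, 45000, 60000, 75000, 95000, 120000, 170000)

# Hypothetical respondent thresholds for "rich"; NA marks a skipped item.
rich_guess <- c(100000, 250000, 40000, NA, 170000)

# findInterval() counts how many cut-offs are <= each guess, so multiplying
# by 10 gives the lower bound of the decile the guess falls into (0 to 90).
rich_decile <- findInterval(rich_guess, decile_cutoffs) * 10

# Share of non-missing guesses that land in the top decile.
share_top_decile <- mean(rich_decile == 90, na.rm = TRUE)
```

Missing responses stay missing in `rich_decile`, so they can be included in or excluded from the denominator explicitly when shares are reported.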

In both countries, the threshold for being rich most commonly falls in the top decile of the actual income distribution. In the United States, `r round(sum(yougov$rich_percentile>=99, na.rm=T)/length(yougov$rich_percentile)*100,0)`% of respondents think of households in the top 1% of the actual income distribution, and another `r round(sum(yougov$rich_percentile>=90 & yougov$rich_percentile<99, na.rm=T)/length(yougov$rich_percentile)*100,0)`% think of households in the top decile (but below the 1% cut-off). In total, `r round(sum(yougov$rich_percentile>=65, na.rm=T)/length(yougov$rich_percentile)*100,0)`% consider households in the top third of the income distribution rich.\footnote{This includes respondents who answered \$100,000 which is just below the top third cut-off in the actual income distribution.} Among German respondents, `r round(sum(respondi$rich_percentile>=99, na.rm=T)/length(respondi$rich_percentile)*100,0)`% think of households in the top 1% of the actual income distribution, and another `r round(sum(respondi$rich_percentile>=90 & respondi$rich_percentile<99, na.rm=T)/length(respondi$rich_percentile)*100,0)`% think of households in the top decile (but below the 1% cut-off). In total, `r round(sum(respondi$rich_percentile>=66, na.rm=T)/length(respondi$rich_percentile)*100,0)`% of Germans think of households in the top third of the income distribution as rich.

Looking at perceptions of poverty, the most common decile for the poverty threshold is the bottom decile in Germany, and the second from the bottom decile in the United States. `r round(sum(yougov$poor_percentile<=10, na.rm=T)/length(yougov$rich_percentile)*100,0)`% of Americans consider people at or below the tenth percentile poor, while another `r round(sum(yougov$poor_percentile>10 & yougov$poor_percentile<=20, na.rm=T)/length(yougov$rich_percentile)*100,0)`% place the threshold in the second decile of the income distribution. In total, `r round(sum(yougov$poor_percentile<=33, na.rm=T)/length(yougov$rich_percentile)*100,0)`% consider those in the bottom third of the income distribution poor. The corresponding numbers for Germany are `r round(sum(respondi$poor_percentile<=10, na.rm=T)/length(respondi$rich_percentile)*100,0)`%, `r round(sum(respondi$poor_percentile>10 & respondi$poor_percentile<=20, na.rm=T)/length(respondi$rich_percentile)*100,0)`%, and `r round(sum(respondi$poor_percentile<=33, na.rm=T)/length(respondi$rich_percentile)*100,0)`%.

```{r rich decile graphs, fig.cap="\\label{fig:rich_decile_graphs}Estimates of threshold for rich, United States and Germany", warning=F}
options(digits=1)
us_rich_threshold_graph <- ggplot(yougov, aes(x=rich_decile)) +
  geom_bar(aes(y=after_stat(count/sum(count))), fill=myblue, color="black", position=position_nudge(x=5)) +
  theme_bw() +
  ggtitle("US respondents' estimates of incomes that count as rich", subtitle="Mapped to deciles of the actual income distribution") +
  theme(axis.text = element_text(size = 12), plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5)) +
  xlab("Actual US household income deciles") +
  ylab("Share of respondents") +
  ylim(0,0.5) +
  scale_x_continuous(breaks = c(0,10,20,30,40,50,60,70,80,90,100))
ge_rich_threshold_graph <- ggplot(respondi, aes(x=rich_decile)) +
  geom_bar(aes(y=after_stat(count/sum(count))), fill=myblue, color="black", position=position_nudge(x=5)) +
  theme_bw() +
  ggtitle("German respondents' estimates of incomes that count as rich", subtitle="Mapped to deciles of the actual income distribution") +
  theme(axis.text = element_text(size = 12), plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5)) +
  xlab("Actual German household income deciles") +
  ylab("Share of respondents") +
  ylim(0,0.5) +
  scale_x_continuous(breaks = c(0,10,20,30,40,50,60,70,80,90,100))
grid.arrange(us_rich_threshold_graph, ge_rich_threshold_graph)
```

```{r poor decile graphs, fig.cap="\\label{fig:poor_decile_graphs}Estimates of threshold for poor, United States and Germany", warning=F}
options(digits=1)
us_poor_threshold_graph <- ggplot(yougov, aes(x=(poor_decile))) +
  geom_bar(aes(y=after_stat(count/sum(count))), fill=myblue, color="black", position=position_nudge(x=5)) +
  theme_bw() +
  ggtitle("US respondents' estimates of incomes that count as poor", subtitle="Mapped to deciles of the actual income distribution") +
  theme(axis.text = element_text(size = 12), plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5)) +
  xlab("Actual US household income deciles") +
  ylab("Share of respondents") +
  ylim(0,0.5) +
  scale_x_continuous(breaks = c(0,10,20,30,40,50,60,70,80,90,100)) 
ge_poor_threshold_graph <- ggplot(respondi, aes(x=(poor_decile))) +
  geom_bar(aes(y=after_stat(count/sum(count))), fill=myblue, color="black", position=position_nudge(x=5)) +
  theme_bw() +
  ggtitle("German respondents' estimates of incomes that count as poor", subtitle="Mapped to deciles of the actual income distribution") +
  theme(axis.text = element_text(size = 12), plot.title = element_text(hjust = 0.5), plot.subtitle = element_text(hjust = 0.5)) +
  ylim(0,0.5) +
  xlab("Actual German household income deciles") +
  ylab("Share of respondents") +
  scale_x_continuous(breaks = c(0,10,20,30,40,50,60,70,80,90,100)) 
grid.arrange(us_poor_threshold_graph, ge_poor_threshold_graph)
```

```{r setup rounding for in-text numbers 2, include=F}
options(digits=1)
```

We may consider some responses off the mark, either due to genuine but unusual perceptions or inattentive responding. `r round(sum(yougov$rich_percentile<10, na.rm=T)/length(yougov$rich_percentile)*100,0)`% of American and `r round(sum(respondi$rich_percentile<10, na.rm=T)/length(respondi$rich_percentile)*100,0)`% of German respondents give a threshold for rich in the bottom decile of the actual income distribution. `r round(sum(yougov$poor_percentile>50, na.rm=T)/length(yougov$rich_percentile)*100,0)`% of both Americans and Germans give poverty thresholds above the actual median household income. Both samples were collected as part of a broader survey, which means that attention checks could not be used. However, the rates of implausible responses are similar to those that could be caused by inattentive responding [@aronow2020] or trolling [@lopez2018], issues that are not unique to these survey items.

Table \ref{tab:summary_statistics} displays the mean, median, and modal estimates from all three samples. The modal guesses tend to be round numbers, and with the possible exception of the US representative sample whose 65th percentile modal guess of \$100,000 is on the lower side, these are reasonable estimates. The median responses are even more consistent, with estimates for the threshold of rich around the 85th-90th percentiles, and for poor around the 10th to 15th percentiles.

For each sample, two means are displayed: the first is based on unadjusted data, and the second is based on data after winsorizing the estimates to the 99th percentile of the actual income distribution. As the table shows, winsorizing is required for mean estimates to be plausible. This is due to high top-end estimates affecting the mean, which is one of the issues identified in @heiserman2021 as also affecting other survey items about inequality. Qualitatively, winsorizing to the 99th percentile of the actual income distribution can be interpreted as collapsing into one category everyone whose response indicates that they only consider households in the top 1% of incomes rich. If the researcher is comfortable doing this, then analysis of means becomes more viable.
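
To make the effect of this adjustment concrete, the following is a minimal sketch of top-coding responses at the 99th percentile of the actual income distribution. The cut-off `p99_income` and the responses in `rich_guess` are hypothetical values chosen for illustration, not figures from the samples described here.

```r
# Hypothetical values for illustration only.
p99_income <- 500000                          # assumed 99th-percentile income
rich_guess <- c(150000, 1e6, 250000, 5e7, NA) # NA marks a skipped item

# Top-code: any guess above the 99th percentile is set to that percentile.
# pmin() keeps NA as NA, so missing responses stay missing.
rich_topcoded <- pmin(rich_guess, p99_income)

# The unadjusted mean is dominated by the single 5e7 response;
# the top-coded mean is not.
mean_raw <- mean(rich_guess, na.rm = TRUE)
mean_topcoded <- mean(rich_topcoded, na.rm = TRUE)
```

In this toy example a single extreme response pulls the unadjusted mean far above every other answer, while the top-coded mean stays within the range of the remaining responses.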

```{r rich means medians modes table unadjusted and adjusted 12 lines, results = 'asis', echo=F}
options(digits=2, scipen=999)
estimates_table <- tibble('Statistic' = character(0) , 'Rich estimate' = numeric(0), 'Rich equivalent percentile' = numeric(0), 'Poor estimate' = numeric(0), 'Poor equivalent percentile' = numeric(0))

estimates_table[1,] <- list('Unadjusted Mean', mean(pilot$inc_rich_clean, na.rm=T), sum(mean(pilot$inc_rich_clean, na.rm=T)>us19$us19_quantiles), NA , NA)
estimates_table[2,] <- list('Winsorized Mean', mean(pilot$rich_topcoded, na.rm=T), sum(mean(pilot$rich_topcoded, na.rm=T)>us19$us19_quantiles), NA , NA)
estimates_table[3,] <- list('Median', median(pilot$rich_topcoded, na.rm=T),sum(median(pilot$rich_topcoded, na.rm=T)>us19$us19_quantiles), NA,NA)
estimates_table[4,] <- list('Mode', getmode(pilot$inc_rich_clean), sum(getmode(pilot$inc_rich_clean)>us19$us19_quantiles), NA,NA)
estimates_table[5,] <- list('Unadjusted Mean', mean(yougov$Q1, na.rm=T), sum(mean(yougov$Q1, na.rm=T)>us19$us19_quantiles), mean(yougov$Q2, na.rm=T), sum(mean(yougov$Q2, na.rm=T)>us19$us19_quantiles))
estimates_table[6,] <- list('Winsorized Mean', mean(yougov$rich_topcoded, na.rm=T), sum(mean(yougov$rich_topcoded, na.rm=T)>us19$us19_quantiles), mean(yougov$poor_topcoded, na.rm=T), sum(mean(yougov$poor_topcoded, na.rm=T)>us19$us19_quantiles))
estimates_table[7,] <- list('Median', median(yougov$rich_topcoded, na.rm=T), sum(median(yougov$rich_topcoded, na.rm=T)>us19$us19_quantiles), median(yougov$poor_topcoded, na.rm=T), sum(median(yougov$poor_topcoded, na.rm=T)>us19$us19_quantiles))
estimates_table[8,] <- list('Mode', getmode(yougov$Q1), sum(getmode(yougov$Q1)>us19$us19_quantiles), getmode(yougov$Q2), sum(getmode(yougov$Q2)>us19$us19_quantiles))
estimates_table[9,] <- list('Unadjusted Mean', mean(respondi$rich_inc, na.rm=T), sum(mean(respondi$rich_inc, na.rm=T)>de18$de18_quantile), mean(respondi$poor_inc, na.rm=T), sum(mean(respondi$poor_inc, na.rm=T)>de18$de18_quantile))
estimates_table[10,] <- list('Winsorized Mean', mean(respondi$rich_topcoded, na.rm=T), sum(mean(respondi$rich_topcoded, na.rm=T)>de18$de18_quantile), mean(respondi$poor_topcoded, na.rm=T), sum(mean(respondi$poor_topcoded, na.rm=T)>de18$de18_quantile))
estimates_table[11,] <- list('Median', median(respondi$rich_topcoded, na.rm=T), sum(median(respondi$rich_topcoded, na.rm=T)>de18$de18_quantile), median(respondi$poor_topcoded, na.rm=T), sum(median(respondi$poor_topcoded, na.rm=T)>de18$de18_quantile))
estimates_table[12,] <- list('Mode', getmode(respondi$rich_inc), sum(getmode(respondi$rich_inc)>de18$de18_quantile), getmode(respondi$poor_inc), sum(getmode(respondi$poor_inc)>de18$de18_quantile))

estimates_table_k <- kable(estimates_table, digits=1, booktabs=T, caption = "\\label{tab:summary_statistics}Summary statistics: respondent estimates of household income cutoffs for the rich and the poor", format.args = list(big.mark = ",")) %>%
  add_footnote("In accordance with cultural convention, US respondents estimate gross annual household incomes in USD; German respondents estimate net monthly household incomes in Euro. Data has been winsorized as described in main text.") %>%
  pack_rows(index = c('United States, Convenience sample, 2020' = 4, 'United States, Representative sample, 2021' = 4, 'Germany, Representative sample, 2021' = 4), bold=F, italic=T)
estimates_table_k
```

```{r setup rounding for in-text numbers 3, include=F}
options(digits=1)
```

We can also explore the performance of these items by examining their correlates. For example, people tend to perceive themselves as closer to the average earner than they truly are [@karadja2017; @fernandez-albertos2018]. If the items capture this bias, then answers should vary according to the respondent's own household income. This is in fact the case: respondents with above-median household incomes suggest higher thresholds both for being rich and for being poor. See Online Appendix C for details.

To explore how sensitive responses are to mathematical complexity, respondents in the German sample answered two additional questions. After the income threshold items, they estimated the *share* of households that are rich and poor, respectively. Illogical responses arose even though each answer was capped at 100 percent: `r round(sum(respondi$richpluspoor>100, na.rm=T)/nrow(respondi)*100,0)`% of respondents gave answers that summed to more than 100%, and an additional `r round(sum(respondi$richpluspoor==100, na.rm=T)/nrow(respondi)*100,0)`% gave responses that summed to exactly 100%, implying that society is composed entirely of the rich and the poor. Perhaps most concerning, the respondents' education level was the main predictor both of whether a respondent gave a logically impossible response, and of how large a share of the population they saw as rich or poor. See Online Appendix D for details. While some of this may be due to the less advantaged seeing a more bifurcated society, it also suggests that questions that reference shares are more difficult to parse for those with fewer years of education. As a result, complicated survey items may systematically misrepresent the perspectives of the less advantaged.
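
The consistency check described above can be sketched as follows. The vectors `rich_share` and `poor_share` and the category labels are hypothetical and serve only to illustrate the logic; they are not the survey data.

```r
# Hypothetical share estimates (in percent); illustrative values only.
rich_share <- c(10, 40, 70, 5, NA)
poor_share <- c(20, 60, 50, 15, 30)

# Sum of the two shares; NA if either item is missing.
share_sum <- rich_share + poor_share

# Classify each respondent's pair of answers.
consistency <- ifelse(is.na(share_sum), NA_character_,
               ifelse(share_sum > 100, "impossible",
               ifelse(share_sum == 100, "no middle", "consistent")))

# Percentage of all respondents (missing included in the denominator)
# whose two shares sum to more than 100.
pct_impossible <- sum(share_sum > 100, na.rm = TRUE) / length(share_sum) * 100
```

Keeping missing responses in the denominator, as in this sketch, yields a conservative rate of logically impossible answers.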

# Potential uses

The theory section suggested potential research uses for these items, building on the idea that by taking the ratio between the thresholds for rich and poor, we have a measure of perceived inequality. In this section, I empirically illustrate how research using these items might look.

First, we can ask how this measure of perceived inequality relates to objective inequality. For example, the median American respondent estimated a rich to poor gross annual income ratio of `r median(yougov$rich_poor_ratio, na.rm=TRUE)`; the 90:10 ratio in the actual distribution is `r us19[us19$percentiles==90,"us19_quantiles"]/us19[us19$percentiles==10,"us19_quantiles"]`. The median German respondent estimated a rich to poor net monthly income ratio of `r median(respondi$rich_poor_ratio, na.rm=TRUE)`; the 90:10 ratio in the actual distribution is `r de18[de18$percentile==90,"de18_quantile"]/de18[de18$percentile==10,"de18_quantile"]`. Alternatively, perhaps the most relevant inequality is not national, but local. For example, according to the @oecd2014, the German Bundesland with the most equal distribution of disposable income is Saxony, while the most unequal is Hessen. The overall median rich to poor ratio estimate in the German sample is `r median(respondi$rich_poor_ratio, na.rm=T)` (n = `r nrow(respondi)`), with the median estimate in Saxony at `r median(respondi[respondi$bundesland==13,"rich_poor_ratio"], na.rm=T)` (n = `r sum(respondi$bundesland==13)`) and the median estimate in Hessen at `r median(respondi[respondi$bundesland==7,"rich_poor_ratio"], na.rm=T)` (n = `r sum(respondi$bundesland==7)`).
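The comparison above rests on two simple quantities: a per-respondent ratio of the two thresholds, and a 90:10 ratio taken from a table of income percentiles. A minimal sketch follows, with invented numbers and hypothetical object names (`resp`, `income_dist`) standing in for the survey responses and the actual income distribution.

```r
# Invented toy responses; the values are not from any survey.
resp <- data.frame(
  rich_inc = c(200000, 150000, 500000, 100000),
  poor_inc = c(25000, 30000, 20000, 25000)
)

# Perceived inequality: each respondent's rich threshold over their poor threshold.
resp$rich_poor_ratio <- resp$rich_inc / resp$poor_inc
median_ratio <- median(resp$rich_poor_ratio, na.rm = TRUE)

# Hypothetical percentile table for the actual income distribution.
income_dist <- data.frame(
  percentile = c(10, 50, 90, 99),
  income     = c(16000, 69000, 184000, 500000)
)

# Benchmark: income at the 90th percentile over income at the 10th.
p90_p10 <- income_dist$income[income_dist$percentile == 90] /
  income_dist$income[income_dist$percentile == 10]
```

The median is used rather than the mean because a few very high rich thresholds can produce extreme ratios.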

Expanding on the observation that the subjective nature of the answers means they capture the most salient aspects of class boundaries, we can ask how perceptions of inequality are related to the shape of the income distribution. For example, runaway top-end incomes could lead to the ultra-rich being a salient comparison point, resulting in higher estimates of what it takes to be rich. Suggestively, in the US, where the marginal top 1% income earner makes `r us19[us19$percentiles==99,"us19_quantiles"]/us19[us19$percentiles==50,"us19_quantiles"]` times as much gross income per year as the median household, `r round(sum(yougov$rich_percentile>=99, na.rm=T)/length(yougov$rich_percentile)*100,0)`% of respondents estimated a threshold for rich in the top 1% of the actual income distribution. In Germany, where the marginal top 1% income earner makes `r de18[de18$percentile==99,"de18_quantile"]/de18[de18$percentile==50,"de18_quantile"]` times as much net income per month as the median household, `r round(sum(respondi$rich_percentile>=99, na.rm=T)/length(respondi$rich_percentile)*100,0)`% of respondents estimated the threshold for rich in the top 1% of the actual income distribution.
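One way to place a stated threshold within the actual income distribution is to count how many percentile cutpoints it meets or exceeds. The sketch below does this with base R's `findInterval()` on an invented, evenly spaced percentile table. The table, the thresholds, and this particular mapping rule are all assumptions made for illustration, and the percentile variable used in the text may have been constructed differently.

```r
# Invented percentile table: income cutpoints for percentiles 1 to 99.
percentile_income <- 1000 * (1:99)

# Invented rich thresholds stated by four hypothetical respondents.
rich_threshold <- c(85000, 99000, 250000, 40500)

# Number of percentile cutpoints at or below each threshold,
# i.e. the highest percentile the threshold reaches.
rich_percentile <- findInterval(rich_threshold, percentile_income)

# Percent of respondents whose threshold falls in the top 1%.
pct_top1 <- sum(rich_percentile >= 99, na.rm = TRUE) / length(rich_percentile) * 100
```

Any threshold above the 99th percentile cutpoint is assigned 99 under this rule, so the measure cannot distinguish the merely affluent top 1% from the ultra-rich.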

Finally, on a somewhat different note, the items are a way to probe who identifies as rich or poor. In the German sample, respondents were not restricted to categories when asked their household income, so their incomes can be compared to their stated thresholds. `r round(sum(respondi$hh_inc>=respondi$rich_inc, na.rm=T)/nrow(respondi)*100,0)`% of respondents have incomes at or above their self-stated threshold for being rich (the median income of these respondents was €`r median(respondi[respondi$hh_inc>=respondi$rich_inc,"hh_inc"], na.rm=TRUE)`). `r round(sum(respondi$hh_inc<=respondi$poor_inc, na.rm=T)/nrow(respondi)*100,0)`% of respondents have incomes at or below their self-stated threshold for poverty (the median income of these respondents was €`r median(respondi[respondi$hh_inc<=respondi$poor_inc,"hh_inc"], na.rm=TRUE)`).
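A minimal sketch of this comparison, using invented data and hypothetical column names: each respondent's own household income is checked against the thresholds that same respondent stated.

```r
# Invented toy data; the column names and values are hypothetical.
resp <- data.frame(
  hh_inc   = c(2000, 6000, 900, 3500, NA),
  rich_inc = c(5000, 5500, 4000, 10000, 6000),
  poor_inc = c(1000, 1200, 1000, 800, 900)
)

# which() keeps only rows where the comparison is TRUE, silently dropping
# comparisons that are NA because an income or threshold is missing.
rich_idx <- which(resp$hh_inc >= resp$rich_inc)
poor_idx <- which(resp$hh_inc <= resp$poor_inc)

# Percent of all respondents who are rich or poor by their own definition.
pct_self_rich <- length(rich_idx) / nrow(resp) * 100
pct_self_poor <- length(poor_idx) / nrow(resp) * 100

# Median household income within each self-defined group.
median_inc_self_rich <- median(resp$hh_inc[rich_idx])
median_inc_self_poor <- median(resp$hh_inc[poor_idx])
```

Indexing with `which()` rather than with the raw logical vector matters here: a logical index containing `NA` returns an `NA` row for each missing comparison, which would otherwise have to be removed before summarizing.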

# Discussion

In this article, I characterize and suggest potential uses for two survey questions about perceptions of economic inequality. The questions are designed to be intuitive to answer: most importantly, they do not require respondents to engage with distributions (percentiles, deciles, shares, etc.). Instead, the questions ask respondents to identify the income at which a household becomes rich, and the income at which a household counts as poor. In samples collected in the United States and Germany, I show that the response patterns are generally reasonable reflections of reality. In the public's eye, the threshold for a household to be rich is generally in the 80th-90th percentiles of the income distribution, while the threshold to be poor falls in the 10th-20th percentiles.

These items reverse the more common approach of first identifying an objective measure of inequality and then asking respondents to guess what that quantity is. Instead, the questions are formulated to be easy to answer and intuitive to think about, giving respondents an opportunity to express their social perceptions. It is up to the researcher to establish whether people's perceptions of differences between the rich and the poor are sensitive to specific aspects of inequality. As researchers, we can and should be precise about the definitions and measures of inequality that we study, but we can do this without expecting people to think about this concept the same way we do.

This approach has limitations. There are aspects of inequality these items do not capture, even implicitly, such as whether there are more rich people than poor people. Additionally, because the categories "rich" and "poor" are inherently subjective, these items do not have objectively correct answers and cannot be used to evaluate respondent accuracy. Instead, they capture salient aspects of class politics, which may shift over time and space, a property that needs to be kept in mind as the results are interpreted.

Overall, I suggest these questions are a useful tool for "meeting respondents where they are". The questions are designed to be easy to understand, and respondents give a reasonable range of responses to them. The items also have a range of potential applications. These questions may thus be a valuable addition to the range of survey questions that we ask about inequality.

\singlespacing

\newpage

# References {.unnumbered}

::: {#refs}
:::
